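Previous studies of the perimeter defense game have largely focused on the fully observable setting, in which the true player states are known to all players. However, this is unrealistic for practical implementation, since defenders may have to perceive the intruders and estimate their states. In this work, we study the perimeter defense game in a photo-realistic simulator and in the real world, requiring defenders to estimate intruder states from vision. We address this problem by training, with domain randomization, a machine-learning-based system for intruder pose detection that aggregates multiple views to reduce state estimation errors, and by adapting the defensive strategy accordingly. We introduce new performance metrics to evaluate vision-based perimeter defense. Through extensive experiments, we show that our approach improves state estimation and, ultimately, perimeter defense performance in both 1-defender-vs-1-intruder and 2-defenders-vs-1-intruder games.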
Translated by Google Translate
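In this paper, we address the problem of vision-based detection and tracking of multiple aerial vehicles using a single camera and an Inertial Measurement Unit (IMU), together with the corresponding perception consensus problem (i.e., unique and identical IDs across all observing agents). We design several vision-based decentralized Bayesian multi-tracking filtering strategies to resolve the association between the incoming unsorted measurements obtained by a visual detector algorithm and the tracked agents. We compare their accuracy under different operating conditions, as well as their scalability with respect to the number of agents in the team. This analysis provides useful insights into the most appropriate design choice for a given task. We further show that the proposed perception and inference pipeline, which includes a Deep Neural Network (DNN) as the visual target detector, is lightweight and can run concurrently with control and planning on board robots with Size, Weight, and Power (SWaP) constraints. Experimental results show effective tracking of multiple drones in various challenging scenarios, such as heavy occlusions.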
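Accurately modeling a quadrotor's system dynamics is critical for guaranteeing agile, safe, and stable navigation. The model needs to capture the system behavior in multiple flight regimes and operating conditions, including those producing highly nonlinear effects such as aerodynamic forces and torques, rotor interactions, or possible system configuration modifications. Classical approaches rely on handcrafted models and struggle to generalize and scale to capture these effects. In this paper, we present a novel Physics-Inspired Temporal Convolutional Network (PI-TCN) approach to learning a quadrotor's system dynamics purely from robot experience. Our approach combines the expressive power of sparse temporal convolutions and dense feed-forward connections to make accurate system predictions. In addition, physics constraints are embedded in the training process to improve the network's ability to generalize to data outside the training distribution. Finally, we design a model predictive control approach that incorporates the learned dynamics for accurate closed-loop trajectory tracking, fully exploiting the learned model predictions in a receding-horizon fashion. Experimental results demonstrate that our approach accurately extracts the structure of the quadrotor's dynamics from data, capturing effects that remain hidden to classical approaches. To the best of our knowledge, this is the first time that physics-inspired deep learning has been successfully applied to temporal convolutional networks and to the system identification task while concurrently enabling predictive control.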
Autonomous Micro Aerial Vehicles are deployed for a variety of tasks, including surveillance and monitoring. Perching and staring allow the vehicle to monitor targets without flying, saving battery power and increasing the overall mission time without the need to frequently replace batteries. This paper addresses the Active Visual Perching (AVP) control problem of autonomously perching on inclined surfaces up to $90^\circ$. Our approach generates dynamically feasible trajectories to navigate to and perch on a desired target location, while taking into account actuator and Field of View (FoV) constraints. By replanning in mid-flight, we take advantage of more accurate target localization, increasing the perching maneuver's robustness to target localization or control errors. We leverage the Karush-Kuhn-Tucker (KKT) conditions to identify the compatibility between the planning objectives and the visual sensing constraint during the planned maneuver. Furthermore, we experimentally identify the corresponding boundary conditions that maximize the spatio-temporal target visibility during the perching maneuver. The proposed approach runs on board in real time under significant computational constraints, relying exclusively on cameras and an Inertial Measurement Unit (IMU). Experimental results validate the proposed approach and show a higher success rate, as well as increased target interception precision and accuracy, with respect to a one-shot planning approach, while still retaining aggressive capabilities with flight envelopes that include large excursions from the hover position on inclined surfaces up to $90^\circ$, angular speeds up to 750~deg/s, and accelerations up to 10~m/s$^2$.
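By enabling cooperation among agents, multi-robot systems naturally provide additional flexibility, resilience, and robustness compared with single robots. To improve autonomous robot decision-making and situational awareness, multi-robot systems must coordinate how they gather, share, and fuse environment information across agents in an efficient and meaningful way, so as to accurately obtain context-appropriate information or gain resilience to sensor noise or failures. In this paper, we propose a general-purpose Graph Neural Network (GNN) whose main goal is to increase, in multi-robot perception tasks, each single robot's inference perception accuracy as well as its resilience to sensor failures and disturbances. We show that the proposed framework can address multi-view visual perception problems such as monocular depth estimation and semantic segmentation. Several experiments using both photo-realistic and real data collected from the viewpoints of multiple aerial robots show the effectiveness of the proposed approach under challenging inference conditions, including images corrupted by heavy noise and by camera occlusions or failures.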
Background: Image analysis applications in digital pathology include various methods for segmenting regions of interest. Identifying these regions is one of the most complex steps and is therefore of great interest for the study of robust methods that do not necessarily rely on a machine learning (ML) approach. Method: A fully automatic and optimized segmentation process for different datasets is a prerequisite for classifying and diagnosing Indirect ImmunoFluorescence (IIF) raw data. This study describes a deterministic computational neuroscience approach for identifying cells and nuclei. It is far from the conventional neural network approach, yet it matches the quantitative and qualitative performance of such networks and is also robust to adversarial noise. The method is robust, based on formally correct functions, and does not depend on tuning to specific datasets. Results: This work demonstrates the robustness of the method against the variability of parameters such as image size, mode, and signal-to-noise ratio. We validated the method on two datasets (Neuroblastoma and NucleusSegData) using images annotated by independent medical doctors. Conclusions: The definition of deterministic and formally correct methods, from a functional to a structural point of view, guarantees the achievement of optimized and functionally correct results. The excellent performance of our deterministic method (NeuronalAlg) in segmenting cells and nuclei from fluorescence images was measured with quantitative indicators and compared with that achieved by three published ML approaches.
The broad usage of mobile devices nowadays, the sensitivity of the information they contain, and the shortcomings of current mobile user authentication methods call for novel, secure, and unobtrusive solutions to verify users' identity. In this article, we propose TypeFormer, a novel Transformer architecture to model free-text keystroke dynamics performed on mobile devices for the purpose of user authentication. The proposed model consists of Temporal and Channel Modules enclosing two Long Short-Term Memory (LSTM) recurrent layers, Gaussian Range Encoding (GRE), a multi-head Self-Attention mechanism, and a Block-Recurrent structure. In experiments on one of the largest public databases to date, the Aalto mobile keystroke database, TypeFormer outperforms current state-of-the-art systems, achieving Equal Error Rate (EER) values of 3.25% using only 5 enrolment sessions of 50 keystrokes each. In this way, we contribute to reducing the traditional performance gap of the challenging mobile free-text scenario with respect to its desktop and fixed-text counterparts. Additionally, we analyse the behaviour of the model under different experimental configurations, such as the length of the keystroke sequences and the number of enrolment sessions, showing margin for improvement with more enrolment data. Finally, a cross-database evaluation is carried out, demonstrating the robustness of the features extracted by TypeFormer in comparison with existing approaches.
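For readers unfamiliar with the Equal Error Rate quoted above, the following is a minimal sketch of how an EER can be approximated from verification scores. It is not taken from the TypeFormer paper: the function name `equal_error_rate`, the convention that higher scores mean "more likely genuine", and the simple threshold sweep are all assumptions made for illustration.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER for similarity scores (higher = more likely genuine).

    Hypothetical helper for illustration; assumes both lists are non-empty.
    """
    # Candidate thresholds: every observed score.
    thresholds = sorted(set(genuine) | set(impostor))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection rate: genuine attempts scoring below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        # False acceptance rate: impostor attempts scoring at or above it.
        far = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    genuine_scores = [0.9, 0.8, 0.7, 0.4]
    impostor_scores = [0.6, 0.3, 0.2, 0.1]
    print(equal_error_rate(genuine_scores, impostor_scores))
```

The sweep only visits observed scores, so the result is an approximation of the point where the false acceptance and false rejection curves cross; evaluation toolkits typically interpolate between thresholds instead.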
Digital media have enabled access to unprecedented literary knowledge. Authors, readers, and scholars are now able to discover and share an increasing amount of information about books and their authors. Nevertheless, digital archives are still unbalanced: writers from non-Western countries are less represented, and this condition leads to the perpetuation of old forms of discrimination. In this paper, we present the Under-Represented Writers Knowledge Graph (URW-KG), a resource designed to explore and possibly amend this lack of representation by gathering and mapping information about works and authors from Wikidata and three other sources: Open Library, Goodreads, and Google Books. Experiments based on KG embeddings showed that the integrated information encoded in the graph exposes scholars and users to non-Western literary works and authors more easily than Wikidata alone. This opens the way to the development of fairer and more effective tools for author discovery and exploration.
Content-Controllable Summarization generates summaries focused on given controlling signals. Due to the lack of large-scale training corpora for the task, we propose a plug-and-play module, RelAttn, to adapt any general summarizer to the content-controllable summarization task. RelAttn first identifies the relevant content in the source documents, and then makes the model attend to the right context by directly steering the attention weights. We further apply an unsupervised online adaptive parameter searching algorithm to determine the degree of control in the zero-shot setting, while such parameters are learned in the few-shot setting. By applying the module to three backbone summarization models, experiments show that our method effectively improves all the summarizers and outperforms the prefix-based method and a widely used plug-and-play model in both zero- and few-shot settings. Tellingly, more benefit is observed in scenarios where more control is needed.
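The idea of steering attention toward relevant content with a tunable degree of control can be illustrated with a small sketch. This is not RelAttn's actual formulation, which the abstract does not specify: the function `steer_attention`, the linear interpolation between the original attention distribution and a normalized relevance distribution, and the `strength` parameter in [0, 1] are all assumptions chosen for illustration.

```python
def steer_attention(attn, relevance, strength):
    """Blend an attention distribution with normalized relevance scores.

    Hypothetical illustration: `attn` is assumed to sum to 1 over source
    tokens, `relevance` holds non-negative per-token relevance scores, and
    `strength` (0 = no control, 1 = relevance only) is the degree of control.
    """
    if len(attn) != len(relevance):
        raise ValueError("attn and relevance must have the same length")
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    total = sum(relevance)
    if total == 0:
        # No token is marked relevant, so leave the attention unchanged.
        return list(attn)
    rel = [r / total for r in relevance]
    # A convex combination of two distributions is still a distribution.
    return [(1 - strength) * a + strength * r for a, r in zip(attn, rel)]


if __name__ == "__main__":
    print(steer_attention([0.5, 0.5], [1.0, 3.0], 0.5))
```

Under these assumptions, a larger `strength` shifts more attention mass to the tokens marked relevant, which is one simple way to picture a "degree of control" that is either searched for (zero-shot) or learned (few-shot).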
Anticipating future actions based on video observations is an important task in video understanding, and is useful for precautionary systems that need time to react before an event occurs. Since the input in action anticipation consists only of pre-action frames, models do not have enough information about the target action; moreover, similar pre-action frames may lead to different futures. Consequently, any solution using existing action recognition models can only be suboptimal. Recently, researchers have proposed using a longer video context to remedy the insufficient information in pre-action intervals, as well as self-attention to query relevant past moments to address the anticipation problem. However, the indirect use of video input features as the query might be inefficient, as it only serves as a proxy for the anticipation goal. To this end, we propose an inductive attention model, which transparently uses the prior prediction as the query to derive the anticipation result by induction from past experience. Our method naturally considers the uncertainty of multiple futures via many-to-many association. On large-scale egocentric video datasets, our model not only performs consistently better than the state of the art using the same backbone and is competitive with methods that employ a stronger backbone, but also achieves superior efficiency with fewer model parameters.